
[blog] Introducing inf2 runtime blog post #540

Merged: 1 commit into autonomi-ai:main on Feb 1, 2024

Conversation

@spillai (Contributor) commented on Feb 1, 2024

Summary

  • Added a blog post for the inf2 runtime
  • Updated the README with cleaner API/usage examples and value props

Related issues

Checks

  • `make lint`: I've run `make lint` to lint the changes in this PR.
  • `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
  • Additional tests:
    • Benchmark tests (when contributing new models)
    • GPU/HW tests

@spillai added the docs, blog, and inf2 labels on Feb 1, 2024
@spillai self-assigned this on Feb 1, 2024

@outtanames (Contributor) left a comment:

Looks Good.


[AWS Inferentia2](https://aws.amazon.com/en/ec2/instance-types/inf2/) (Inf2 for short) is the second-generation inference accelerator from AWS. Inf2 instances improve on Inf1 (originally launched in 2019), delivering 3x higher compute performance, 4x larger total accelerator memory, up to 4x higher throughput, and up to 10x lower latency. Inf2 instances are also the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference with ultra-high-speed connectivity between accelerators.

Relative to the [AWS G5 instances](https://aws.amazon.com/ec2/instance-types/g5/) ([NVIDIA A10G](https://www.nvidia.com/en-us/data-center/products/a10-gpu/)), Inf2 instances promise up to 50% better performance-per-watt. Inf2 instances are ideal for applications such as natural language processing, recommender systems, image classification and recognition, speech recognition, and language translation that can take advantage of scale-out distributed inference.

Contributor:

should we quote the inf2 numbers here against say an A100 for reference?

Contributor (Author):

Will do this once we have the profiling numbers to compare; I could only find this stat relative to G5.


## 📦 Deploying a model on Inferentia2 with NOS

Deploying models on AWS Inferentia2 chips presents a unique set of challenges, distinctly different from the experience with NVIDIA GPUs, primarily due to the lack of a mature toolchain for compiling, profiling, and deploying models onto these specialized ASICs. To use AWS Inferentia2 chips effectively, models must first be traced and compiled for the hardware, a process that demands a deep understanding of the deployment toolchain, including PyTorch IR op support and the [AWS Neuron SDK](https://github.com/aws-neuron/aws-neuron-sdk). NOS aims to bridge this gap and streamline the deployment process, making it easier for developers to leverage the inference capabilities of AWS Inferentia2 for their workloads and to expose them as easy-to-use gRPC/RESTful services.
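
To make the tracing/compilation step above concrete, here is a minimal sketch using the Neuron SDK's PyTorch frontend, `torch-neuronx`. The model, sequence length, and file name are illustrative assumptions; this shows the kind of work the Neuron toolchain expects, not NOS internals:

```python
# Illustrative sketch (not NOS internals): compiling a sentence-embedding
# model for Inferentia2 with torch-neuronx on an inf2 instance.
import torch_neuronx
from transformers import AutoModel, AutoTokenizer

model_id = "BAAI/bge-small-en-v1.5"  # same model as the cost table below
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torchscript=True makes the model return tuples, which tracing requires.
model = AutoModel.from_pretrained(model_id, torchscript=True).eval()

# Example inputs pin the static shapes the compiled graph will support;
# the sentence and max_length here are arbitrary placeholders.
inputs = tokenizer("an example sentence", padding="max_length",
                   max_length=128, return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"])

# Trace and compile to a Neuron-optimized TorchScript module, then save it
# so it can be reloaded with torch.jit.load(...) at serving time.
neuron_model = torch_neuronx.trace(model, example)
neuron_model.save("bge-small-en-v1.5.neuron.pt")
```
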
Contributor:

[nit] distinctly different -> distinct. Also wouldn't say 'lack of a mature toolchain'; maybe just point out that the Neuron SDK is very different from the torch CUDA ecosystem.

| Model | Cloud Instance | Spot | Cost / hr | Cost / month | # of Req. / $ |
| ----- | -------------- | ---- | --------- | ------------ | ---------- |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | `inf2.xlarge` | - | $0.75 | ~$540 | ~685K / $1 |
| **[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)** | `inf2.xlarge` | ✅ | **$0.32** | **~$230** | ~1.6M / $1 |
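
As a quick sanity check, the monthly costs in the table follow directly from the hourly rates (assuming roughly 720 hours in a month); the requests-per-dollar column additionally depends on measured throughput, which isn't reproduced here:

```python
# Back-of-the-envelope check of the cost table above (~720 hours / month).
on_demand_per_hr = 0.75  # inf2.xlarge, on-demand $/hr
spot_per_hr = 0.32       # inf2.xlarge, spot $/hr
hours_per_month = 24 * 30

print(f"on-demand: ~${on_demand_per_hr * hours_per_month:.0f}/month")  # ~$540
print(f"spot:      ~${spot_per_hr * hours_per_month:.0f}/month")       # ~$230
```
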
Contributor:

💯

@spillai merged commit 47a0776 into autonomi-ai:main on Feb 1, 2024
2 checks passed
Labels: blog, docs, inf2

2 participants